# ImageNet-Pretrained Models

| Model | Author | License | Tags | Downloads | Likes | Description |
| --- | --- | --- | --- | --- | --- | --- |
| MAR VAE KL16 | xwen99 | MIT | Image Generation | 81 | 0 | A KL-16 variational autoencoder (VAE) trained on ImageNet-1k for image-to-image tasks. |
| PVT Tiny 224 | Xrenya | Apache-2.0 | Image Classification, Transformers | 25 | 0 | Pyramid Vision Transformer (PVT), a transformer-based vision model designed for image classification. |
| EfficientNet B6 | google | Apache-2.0 | Image Classification, Transformers | 167 | 0 | A mobile-friendly, purely convolutional model that uniformly scales depth, width, and resolution via a compound coefficient; trained on ImageNet-1k. |
| EfficientNet B1 | google | Apache-2.0 | Image Classification, Transformers | 1,868 | 1 | A mobile-friendly, purely convolutional network that scales efficiently by uniformly adjusting depth, width, and resolution via a compound coefficient. |
| MobileNet V2 1.4 224 | Matthijs | Other | Image Classification, Transformers | 26 | 0 | A lightweight convolutional network designed for mobile devices and well suited to image classification. |
| MobileNet V2 1.0 224 | Matthijs | Other | Image Classification, Transformers | 29 | 0 | A lightweight convolutional network designed for mobile devices and well suited to image classification. |
| MobileNet V1 1.0 224 | Matthijs | Other | Image Classification, Transformers | 41 | 0 | A lightweight convolutional network for mobile and embedded vision applications, pretrained on ImageNet-1k. |
| LeViT 128S | facebook | Apache-2.0 | Image Classification, Transformers | 3,198 | 4 | A vision transformer pretrained on ImageNet-1k that borrows convolutional design ideas for faster inference. |
| LeViT 128 | facebook | Apache-2.0 | Image Classification, Transformers | 44 | 0 | A vision-transformer image classifier that achieves efficient inference by borrowing convolutional design ideas. |
| LeViT 256 | facebook | Apache-2.0 | Image Classification, Transformers | 37 | 0 | An efficient transformer-based vision model built for fast inference, pretrained on ImageNet-1k. |
| LeViT 384 | facebook | Apache-2.0 | Image Classification, Transformers | 37 | 0 | A vision transformer pretrained on ImageNet-1k that borrows convolutional design ideas for faster inference. |
| CvT 21 384 | microsoft | Apache-2.0 | Image Classification, Transformers | 29 | 1 | A Convolutional Vision Transformer (CvT) image classifier pretrained on ImageNet-1k at 384x384 resolution. |
| CvT 13 384 | microsoft | Apache-2.0 | Image Classification, Transformers | 27 | 0 | A Convolutional Vision Transformer pretrained on ImageNet-1k that improves on plain vision transformers by introducing convolutional operations. |
| RegNet Y 008 | facebook | Apache-2.0 | Image Classification, Transformers | 22 | 0 | A RegNet model trained on ImageNet-1k; an efficient vision architecture designed through neural architecture search. |
| RegNet X 320 | facebook | Apache-2.0 | Image Classification, Transformers | 31 | 0 | A RegNet model trained on ImageNet-1k; an efficient vision architecture designed through neural architecture search. |
| ResNet 101 | microsoft | Apache-2.0 | Image Classification, Transformers | 4,659 | 17 | A deep residual network pretrained on ImageNet-1k, using the improved v1.5 architecture. |
| ResNet 34 | microsoft | Apache-2.0 | Image Classification, Transformers | 4,355 | 9 | A residual-learning convolutional network for image classification, pretrained on ImageNet-1k. |
| Swin Small Patch4 Window7 224 | microsoft | Apache-2.0 | Image Classification, Transformers | 2,028 | 1 | A hierarchical, window-based vision transformer for image classification whose computational complexity is linear in input image size. |
| Swin Tiny Patch4 Window7 224 | microsoft | Apache-2.0 | Image Classification, Transformers | 98.00k | 42 | A hierarchical vision transformer that achieves linear computational complexity by computing self-attention within local windows. |
| DeiT Tiny Distilled Patch16 224 | facebook | Apache-2.0 | Image Classification, Transformers | 6,016 | 6 | A distilled Data-efficient image Transformer (DeiT), pretrained and fine-tuned on ImageNet-1k at 224x224, learning from a teacher model through distillation. |
| Swin Large Patch4 Window7 224 | microsoft | Apache-2.0 | Image Classification, Transformers | 2,079 | 1 | A hierarchical vision transformer with windowed self-attention and linear computational complexity, suited to image classification and dense recognition tasks. |
| DeiT Base Patch16 224 | facebook | Apache-2.0 | Image Classification, Transformers | 152.63k | 13 | A Data-efficient image Transformer (DeiT), pretrained and fine-tuned on ImageNet-1k at 224x224 resolution. |
| DeiT Tiny Patch16 224 | facebook | Apache-2.0 | Image Classification, Transformers | 29.04k | 9 | An efficiently trained vision transformer, pretrained and fine-tuned on ImageNet-1k for image classification. |
| Swin Base Patch4 Window12 384 | microsoft | Apache-2.0 | Image Classification, Transformers | 1,421 | 4 | A hierarchical vision transformer based on shifted windows, designed for image classification, with computational complexity linear in input image size. |
| ViT Base Patch16 224 In21k | google | Apache-2.0 | Image Classification | 2.2M | 323 | A Vision Transformer pretrained on ImageNet-21k for image classification. |
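
Most of the classification checkpoints above expose the standard Hugging Face `transformers` image-classification interface, so a single loading pattern covers the ResNet, Swin, DeiT, LeViT, CvT, RegNet, EfficientNet, MobileNet, PVT, and ViT entries alike. Below is a minimal sketch, assuming the `author/model` pairs in the table resolve as Hub model IDs (e.g. `microsoft/resnet-101`) and that `transformers`, `torch`, and `Pillow` are installed; `cat.jpg` is a hypothetical local image.

```python
# Minimal sketch: run one of the ImageNet-pretrained classifiers listed above.
# Assumptions: "microsoft/resnet-101" resolves on the Hugging Face Hub, and
# "cat.jpg" is a local image file (both are placeholders here).
import torch
from PIL import Image
from transformers import AutoImageProcessor, AutoModelForImageClassification

model_id = "microsoft/resnet-101"
processor = AutoImageProcessor.from_pretrained(model_id)  # resize/normalize config
model = AutoModelForImageClassification.from_pretrained(model_id)
model.eval()

image = Image.open("cat.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # one score per ImageNet-1k class

pred = logits.argmax(-1).item()
print(model.config.id2label[pred])  # human-readable class name
```

Swapping `model_id` for any other classification entry in the table (e.g. `facebook/deit-base-patch16-224` or `microsoft/swin-tiny-patch4-window7-224`) leaves the rest of the code unchanged. The MAR VAE entry is the exception: it is a generative autoencoder rather than a classifier, so it is not loaded through `AutoModelForImageClassification`.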